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Abstract 

Information flow (or information transfer as may be called) the widely applicable general physics 
notion can be rigorously derived from first principles, rather than axiomatically proposed as an 
ansatz. Its logical association with causality and, particularly, the most stringent one-way causality, 
if existing, is firmly substantiated and stated as a fact in proved theorems. Established in this study 
are the information flows among the components of time-discrete mappings and time-continuous 
dynamical systems, both deterministic and stochastic. They have been obtained explicitly in 
closed form, and all possess the property of causality, which reads: if a component, say x\, has 
an evolutionary law independent of Xj, then the information flow from Xj to Xi vanishes. These 
results have been put to applications with benchmark systems, such as the Kaplan-Yorke map, the 
Rossler system, the baker transformation, the Henon map, and a stochastic potential flow. Besides 
recovering the properties as expected from the respective systems, some of the applications show 
that the information flow structure underlying a complex trajectory pattern could be tractable. For 
linear systems, the resulting remarkably concise formula asserts analytically that causation implies 
correlation, while correlation does not imply causation, resolving unambiguously the long-standing 
debate over causation versus correlation. 
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I. INTRODUCTION 


Information flow, or information transfer as it may be referred to in the literature, has 
been realized as a fundamental notion in general physics. Though literally one may asso¬ 
ciate it with communication, its importance lies far beyond in that it implies causationfl]- 
[5], uncertainty propagation[6], predictability transfer[7], etc. In fact, it is the recognition 
of its causality association that has attracted enormous interest from a wide variety of 
disciplines, particularly in neuroscience[8]-[14], finance[15]-[16], climate science[17]-[18], tur¬ 
bulence research[19]-[20], network dynamics[21]-[24], and dynamical systems particular in 
the field of synchronization[25]-[31]. This recognition has been further substantiated by the 
finding that transfer entropy[5] and Granger causality[32] are equivalent (up to a factor 
2) [33]. 

Historically, many information theoretic quantities have been proposed to measure infor¬ 
mation flow, including time-delayed mutual information[34], transfer entropy[5], momentary 
information transfer[18], causation entropy[35], to name a few. Among these most notably 
is transfer entropy, which has spawned many varieties in its family, e.g., [15], [36], [11], and 
has been widely applied in different disciplines. 

A fundamental question to ask is whether information flow needs to be axiomatically 
proposed as an ansatz (as above), or it can be derived from the first principles in informa¬ 
tion theory. Naturally, one would like to minimize or avoid the use of axioms in introducing 
new concepts in order to have the material more coherent within the field to which it be¬ 
longs. In physics, “flow” or “transfer” does have definite meaning, albeit the meaning may 
differ depending on the context. One then naturally expects that the concept be rigorized. 
Indeed, as we will see soon, at least within the framework of dynamical systems, informa¬ 
tion flow/transfer can be rigorously derived from, rather than empirically or axiomatically 
proposed with, Shannon entropy. 

Another impetus regards the inference of causality. As mentioned in the beginning, in¬ 
formation flow arouses enormous interest in a wide range of fields not because of its original 
meaning in communication but because of its logical implication of causation. Whether 
the cause-effect relation underlying a system can be faithfully revealed is, therefore, the 
touchstone for a formalism of information flow. That is to say, information flow should 
be formulated with causality naturally embedded; it should, in particular, accurately re¬ 
produce a one-way causality (if existing), which is unambiguously equal to zero on one 
side. In this light, the widely used formalism namely transfer entropy is, unfortunately, 
not as satisfactory one expects. This has even led to discussions on whether the two no¬ 
tions, namely, information flow and causality , should be differentiated (e.g., [38]). Since 
it is established that Granger causality and transfer entropy are equivalent, one may first 
look at the problems from the former. Now it is well known that spurious Granger causal¬ 
ity may arise due to unobserved variables that influence the system dynamics (a problem 
identified by Granger himself)[39], due to low resolution in time[40] [41], and clue to observa¬ 
tional noise[42]. Besides, Granger explicitly excludes deterministic systems in establishing 
the causality formalism, a case that for sure is important in realistic problems. For trans¬ 
fer entropy, the issue has just been systematically examined[44], Aside from the failure in 
recovering the many preset one-way causalities, evidence has shown that sometimes it may 
even give qualitatively wrong results; see [44] and [43] for such examples. 

Realizing the limitation of transfer entropy, different alternatives have been proposed; the 
above momentary information transfer is one of these proposals. The purpose of this study is, 


2 



instead of just remedying the deficiencies of the existing formalisms, to put information flow 
the fundamental physical notion on a rigorous footing so that it is universally applicable. 
The stringent one-way causality requirement will not be just verified with certain given 
examples, but rigorously proved as theorems. 

With this faith, recently Liang and Kleeman (2005) [45] take the initiative to study the 
problem with dynamical systems. In this framework, the information source and recipient 
are abstracted as the system components, and hence the problem is converted into the 
information flow or information transfer between dynamical system components. The basic 
idea can be best illustrated with a deterministic system of two components, say, x\ and x 2 : 

dx i „ . 

— = F 1 (xi,x 2 ,t), (1) 

= F 2 (x 1 ,x 2 ,t), (2) 

where we follow the convention in physics and do not distinguish random and deterministic 
variables, which should be clear in the context. Now what we are to consider are the time 
evolutions of the marginal entropies of X\ and x 2 , denoted respectively as Hi and H 2 . Look 
at Xi, its marginal entropy evolution may be clue to X\ itself or subject to the influence of 
x 2 . This partitions the mechanisms that cause Hi to grow into two exclusive parts. That 
is to say, if we write the contribution from the former mechanism as dH*/dt and that from 
the latter as T 2 ^. u 


dH i 
dt 


dH* 

dt 


+ T 2 ^.i. 


( 3 ) 


This T 2 ^i is the very time rate of information flowing from x 2 to x\. We remark that this 
setting is rather generic, except for the requirement of differentiability for the vector held 
F = (Fi, F 2 ) T . In particular, the input-output communication problem can be cast within 
the framework by letting, for example, F 2 = F 2 (xi,t), F x = Fi((xi,t) t where Xi is the 
input/drive and x 2 the output/consequence, and the channel is represented by F 2 . 

From the above argument, the evaluation of the information how T 2 -+i may be fulfilled 
through evaluating dH^/dt. This is because that, when a dynamical system is given, the 
density evolution is known through the corresponding Liouville equation, and, accordingly, 
dHi/dt can be obtained. In [45], Liang and Kleeman prove that the joint entropy of (xi, x 2 ) 
follow a very concise law 


dH 

dt 


F(V-F), 


where E is the operator of mathematical expectation. They then argue that 


<m = F (^±\ 

dt \ dxi) 


and hence obtain the time rate of information flowing from x 2 to X\ 

dHi dH* fldFipA 

2_>1 dt dt \pi dx\ ) 


( 4 ) 

( 5 ) 

( 6 ) 
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where p\ is the marginal probability density function of X\. The thus-obtained information 
flow is asymmetric between x\ and x 2 , moreover, it possesses a property of causality, which 
reads, if the evolution of X\ does not depend on x 2 , then T 2 ^i = 0. 

The above result is later on proved[47] [48]. It is remarkable in that the stringent one-way 
causality in a system, if existing, can be stated as a proven theorem, rather than a fact 
for a formalism to verify; see [46] for a review. This result, however, is only for systems 
of dimension 2 (2D). For systems with many components, it does not work any more. We 
have endeavored to extend it to more general situations and do have obtained results for 
deterministic systems of arbitrary dimensionality which possess the property of causality. 
But, as we will see in the following section, the extension relies on an assumption that is, 
again, axiomatically proposed. This makes the resulting formalism not one fully derived from 
first principles, and as we realize later on, it does not work for multidimensional stochastic 
systems. This line of work, though with a promising start, is stuck at this point. 

In this study, we will show that the assumption can be completely removed. In a unified 
approach, the notion of information flow can be rigorously derived for both deterministic and 
stochastic systems of arbitrary dimensionality. In the following, we first briefly set up the 
framework, and show where the snag lies in the above approach. The solution is then pre¬ 
sented, and applied to derive the information flows for deterministic mappings (section III), 
continuous-time deterministic systems (section IV), stochastic mappings (section V), and 
continuous-time stochastic systems (section VI). For the purpose of demonstration, each 
section contains one ore more applications. As an important particular case, we specialize 
to do the derivation for linear systems, and the material is presented in section VII. This 
study is summarized in section VIII. 


II. THE SNAG THAT STUCK THE LI AN G-KLEEM AN FORMALISM 


The success of the Liang-Kleeman formalism is remarkable. It is, however, only for 2D 
dynamical systems. When the dimensionality exceeds 2, the resulting quantity, namely, (6), 
is not the information transfer from x 2 to X\, but the cumulant transfer to X\ from all other 
components x 2 , x 3 ,..., x n . In this sense, the use of (6) is rather limited. 

In order to extend the formalism to systems of higher dimensionality, Liang and 
Kleeman[47][48] re-interpret the term dH\fdt in the above decomposition (3), for a 2D 
system, as the evolution of H\ with the effect of x 2 excluded. More specifically, it is the 
evolution of H\ with x 2 instantaneously frozen as a parameter at time t. To avoid confusing 
with dH*/dt , denote it as dH^/dt, where the subscript ^ signifies that x 2 is frozen, or that 
its effect is removed. With this, the disjoint decomposition (3) is re-stated as 


d.H x 

dt 


dH \^ 
dt 


+ T 2 ^i- 


(7) 


Note this decomposition, albeit seemingly with only a change of symbol, is actually fun¬ 
damentally different from (3) in physical meaning; it now holds for systems of arbitrary 
dimensionality. The information flow is, therefore, 


dHi d II ]2 

dt dt 


( 8 ) 


Of course, the key is how to find —jp- In [47] and [48], Liang and Kleeman start with 
discrete mappings, and then take the limit as the time stepsize goes to zero. To illustrate, 
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let $ : M n —» M n be a mapping taking x(r) to x(r + 1), from time step r to t + 1. 
Correspondingly there is another mapping g? : L 1 (M n ) —> L 1 (M ra ) that steers its density p 
forward. This mapping is called a Frobenius-Perron operator; we will refer it to F-P operator 
henceforth. Loosely speaking, g? is, for any u C M n , such that[49] 


^p(x)dx = 




p(x)dx. 


( 9 ) 


When the sample space is in a Cartesian product form, as is in this case (M n ), the operator 
can be evaluated. Let a = (cp, a 2 , a n ) be some constant point, and a; = [cp, X\\ x [a 2 , x 2 ] x 
... x [a n ,a; n ]. It has be established that (e.g., [49]) 


J^p(x) 




dx n ...dx 2 dx 1 i( w) 


P(£l, 6) •.■;€n)d€ld€2-~d£n- 


( 10 ) 


For convenience, a is usually taken to be the origin. Furthermore, if $ is nonsingular and 
invertible, then 8P can be explicitly written out 


gg p(x) = p [<L -1 (x)] • \J~ l 


( 11 ) 


where J is the Jacobian of 4>. 

As the F-P operator carries p forth from time step r to r + 1, accordingly the entropies 
H, Hi, and H- 2 are also steered forward. On [r, r + 1], let H\ be incremented by AiJi. By 
the foregoing argument, the evolution of Hi can be decomposed into two exclusive parts, 
namely, the information flow from x 2 , T 2 _>i, and the evolution with the effect of x 2 excluded, 
A Hi^. Hence, for discrete mappings, we have the following counterpart of (8): 


T 2 ^i = A Hi - A H n . 


( 12 ) 


Liang and Kleeman derive the information flow for the continuous system from the dis¬ 
crete mapping. So the whole procedure relies on how 


AHi^ = Hi^t + 1)-Hi(t) 

is evaluated, or, more specifically, how Hi^(t + 1) is evaluated (since LJi(r) is known). To 
see where lies its difficulty, first notice that 


Hi(t T 1) = - [{g»p)i(xi)\og(g»p)i(xi)dxi 

J K 


which is the mean of — log(^p)!(a;i). Given <L, g? can be found in the way as shown above, 
so Hi{t + 1) is known. For Hiy, however, things are much more difficult; — log(<^p)i(;ri) 
involves not only the random variable Xi{r + 1), but also x 2 {r) (embedded in the subscript 
^). What is the joint density of (x 2 (t), xi(t + 1))? We do not know. In [47] and [48], an 
approximation was proposed, which gives 

Hi%(t + 1) = - / (^V)i(pi) log(^p)i(r/i) • p(x 2 1 Xi,x 3 x n ) ■ p 3 ...n{x 3 ,-,x n ) dyidx 2 dx 3 ...dx, 
Jn 

where yi is employed to signify xi (r + 1) and the symbol X\ is reserved for xi (r). This is 
a natural extension of what the authors use in the their original study[45] for 2D discrete 
mappings. A central approximation is that they use 

p(x 2 \xi,X 3 , ...,X n ) ■ {g*'%p)l(yi)p3...n(x3i <~,X n ) 
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to represent the joint pdf of y\ and X 2 (and X 3 ,...x n ) (think about p(x 2 \xi)p(xi) = p(x 1 , £ 2 )). 
This is, however, only an approximation, since we really don’t know what the joint pdf of 
(?/i, X 2 ) is. As we will see soon, though the resulting formalism verifies dH^/dt = dH^/dt 
for 2D systems and the zero-causality property for one-way causal deterministic systems, 
when stochasticity gets in, the causality property cannot be recovered this way. 


III. DETERMINISTIC MAPPING 
A. Derivation 

Fortunately, the issue that stuck the Liang-Kleeman formalism can be fixed; we actually 
can get the entropy without appealing to the joint probability density function of (yi,x 2 ), 
i.e., that of (x \(r + l),X 2 (r)) as mentioned above. Consider a mapping 

$ : ft -A ft, x(t) x(t + 1 ) = ($l(x), $ 2 )x),..., $„(x)) , 

where D is the sample space (M n in particular). Let ^ : D —» be an arbitrary differentiable 
function of x. We have the following theorem: 

Theorem III.l 


£ , '0(x(r + 1)) = £^(<£>(x(t))). (13) 

Remark 1: The expectation operator E on the right hand side applies to a function of 
x(r); it is thence with respect to p(r). Differently, the left hand side E is with respect to 
p(r + 1) = £Pp, where & is the F-P operator as introduced in (9). 

Remark 2: This equality is important in that one actually can obtain the expectation of 
■0(x(t + 1 )) without evaluating 2 ? p. 

Proof. The following proof is in the framework of Riemann-Stieltjes integration. A more 
general proof in terms of Lebesgue theory is also possible but is unnecessary, since the 
functions and vector fields we are dealing with in this study are assumed to differentiable. 

Let {04,04, be a partitioning of the sample space fl The elements are mutually 

exclusive and D = Uj! = 1 c< 4 . To make it simple, assume that these 04 ’s have the same diameter 
(the maximal distance between any two points in 04.). For clarity, write x(r + 1) as y, while 
x is reserved for x(r). Then 


£ty(x(r+ 1)) = / ^p(y)^(y)dy 
Jn 

n „ n „ 

= lim V / &p{y)^(y)dy = lim Y^y/t) / &p(y)dy, 

n—> 00 *—' / n—> oo ‘ / 

U— 1 'J 1 J Wh 


k=\ 


where y*. G c 4 is some point in 04 . The existence of the Riemann integral 22p(y) , ij>(y)dy 
assures that it can be any point in 04 as n goes to infinity, while the resulting integral is the 
same. Now by (9), 


^p{y)dy = 


'Uk 




p(x)dx. 
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So the above becomes 


£Y’(x(t + 1)) = lim V'^(yfc) / ^p{y)dy 

n^-oo ' ^ / 

fc=l 


lim Y"'0(y* 

n—>■ oo z ' 
k=l 

n „ 

lim / 

™ k=1 J*-' 


(wfc) 


/ p(x)dx 
p(x)-0(<P(x))<ix. 


Notice, for <h : fl —)■ hi, fl = Lhca*,, it must be that Ufc < h _1 (a;fc) = fl So the limit converges 
to f Q p(x)-0(<f>(x))(ix. That is to say, F/0(x(r + 1)) = Ei/j(<&(x(t))). □ 

The equality (13) actually can be utilized to derive the F-P operator. We look at the 
particular case when <P is invertible. By definition, Eq. (13) means 


/ 0(x)p(r + l,x)dx = / 0(<l>(x))p(r, x)dx. 

Jn Jn 

If <h is invertible, the right hand side is 0(y) • p (r, < f >_1 (y)) | J” 1 ! dy by transformation of 
variables. Since -0 is arbitrary, we have 

&P = p(t + 1,x) = p(r, $ _1 (x)) • | J r ~ 1 | , 

which is precisely the Frobenius-Perron operator (11). 

The above equality provides us a convenient and accurate way to evaluate Hi{t + 1) and 
Hi^(t + 1). Picking *0 as (log^p)! and log(l^p)i, we obtain, respectively, the following 
formulas: 

Corollary III. 1 


Hi{r + 1) = -Elog^pM^x)), (14) 

H^(t+1) =-Elog(^p)i($i(x)). (15) 

In these formulas, both the expectations are taken with respect to p(aq, x 2 , ■■■x n ), i.e., the 
pdf at time step r. In (15), we do not need to care about the joint pdf p(y,x 2 ) any more. 
The information flow from x 2 to X\ is, therefore, 

Theorem III.2 


T 2 ^ 1 = ^log(^p)!($i(x)) - Elog(^p)i(<h 1 (x)). (16) 

Proof. 

r 2 _! = AHi - A H n = (Hi(r + 1) - H^r)) - (H^(t + 1) - ifi(r)) = H^r + 1) - H^r + 1). 

Substitute into the above formulas for Hi and H ^ and (16) follows. 

Note that the evaluation of T^p and £?p generally depends on the system in question. 

But when <F and <IA are invertible, the information flow can be found explicitly in a closed 
form. 
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B. Properties 


Theorem III. 3 For 2D systems, */$i is invertible, then 

AH 1)i = H 1)i {T + l)-H 1 {T) = Elog\J 1 \. 

Remark: This is the analog of (5) for discrete-time systems[45]. 

Proof. Let x(r + 1) = y. For a 2D system, and if $1 is invertible, we have 

(^V)i(y) = M^r 1 ^)) • l^r 1 !, 


which gives 


= -E x \og [ Pl • iJf 1 !] 

= -e log [pi(xi)) • 1 j^i] 

= -E'logpi^i)+ £'log|J 1 |. 


To avoid confusion, we use E x to indicate that the expectation is with respect to x when 
mixed variables x and y appear simultaneously. Note here x\ = T > —1 1 ( 2 / 1 ) since this is a ID 
system after X 2 is frozen. Thus 

AH = E log \J { \. 


□ 


Theorem III. 4 (Troperty of causality,) //$! is independent of x 2 , then T 2 ->\ = 0. 
Proof. By the definition of the F-P operator, 


{^p)i(xi)dxi = 


' UJ\ 


' UJ\ ) 


' xR n_2 ) 


^p(x 1 , 2 : 3 , ...,x n )dxidx 3 ...dx n 
p\{xI,x 3 ,..., x n )dxidx 3 ...dx n 


for any C R. Note 

x R n " 2 ) = x R"" 2 ). 

That is to say, 


(^p)i{x 1 )dx 1 = 






dx 1 / ...,x n )dx 3 ...dx r 


pAxDdxi = 


pAxDdxi 



since $1 (hence <f> 1 1 ) is independent of x 2 . On the other hand, 


p) 1 (x 1 )dxi = 


<^p(x)dx 


'un xE n 


' ^> _1 (cji xR n_1 ) 


p(x)dx 

p(x)dx 


f dx i [ p(x)dx 2 ...dx r 




= / pi(xi)dx\. 

J — 

So f cvi (^ > ^p) 1 (x 1 )dx 1 = f u)i ('0*p) 1 (x 1 )dx 1 , \/ui C M, and hence (7P^p)i 
fore, 

E\og(3»p)i(xi) = £’log(^p)i(xi), 

and 

T 2 ^i = F/i(r + 1) - ifi*(T + 1) = 0. 

□ 


(^p)i. There- 


C. Application—Kaplan-Yorke map 


Once a dynamical system is specihed, in principle the information flow can be obtained. 
This subsection presents an application with a discrete-time dynamical system, the Kaplan- 
Yorke map [50], that exhibits chaotic behavior. 

The Kaplan-Yorke map is defined as a mapping $ = ($i, <f> 2 ) : [0,1] x M —> [0,1] x M, 
{x 1 ,x 2 ) H- (pi,p 2 ), such that 

yi = &i(xi,x 2 ) = 2xi modi, (17) 

y 2 = <h 2 (^i, x 2 ) = ax 2 + cos( 47 nri). (18) 


A typical trajectory for a = 0.2 is plotted in Fig. 1. We now compute the information flows 
between the two components. 

First we need to find the F-P operator £Pp(yi,y 2 ). Pick a domain u = [0,pi] x [0,p 2 ]. 

By (10) 


&p(yi,V 2 ) = 


d 2 


dy 2 dyi i (w) 
so the key is the hireling of < F _ 1 (u;). Since 




Vi = 


2 xi, 0 < x\ < |, 

2 x\ — 1 , x\ > \ 


it is easy to obtain 


([o,yi]) = 




u 


i i + i/i 
2 ’ 2 


(19) 


( 20 ) 
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FIG. 1: The attractor of the Kaplan-Yorke map (17)-(18) with a = 0,2. To avoid the round-off 
error in the computation which will quickly lead to a zero xq, we let b = 9722377, and instead 
compute a n +i = 2 a n mod b, xgn+i = a n /b , x 2 ,n+i = ax 2 ,n + cos(47rxi )n ). The trajectory is 
initialized with x\ = 7722377 /b, x 2 = 0. (The initial points outside the attractor are not shown.) 


Given y 1 , x\ may be either yi/2 or (1 + yi)/2, but either way, cos(47txi) = cos(27t?/i). Thus 

cos 27 ryi y 2 — cos 27 ryi 


® 2 1 ({yi} x [ 0 . 1 / 2 ]) = 
Eq. (19) is, therefore, 


a 


a 


( 21 ) 


&p{yii 2 / 2 ) = 


d 2 


r-^l /2 


, y 2~ cos 2-7T3/1 


<9t/2% 


d£i 


1 do 


a 2 


-(i+yi )/2 


cos 27ry^ 
a: 

^ y2^cos27ry^ 


p(£i. £ 2)^62 


dy 2 dy 


1 J 1/2 




5 27Tj/^ 


p(£i) £ 2)^62 


1 

2ct 

1 

1 — 
a 


Iff y 2 - COS 27T7/1 ^ + ^ 1 + 1/1 7/2 - cos 2 ttt/i ^ 


2 ’ 


a 




a 




™/2 3 / y 2 - cos 2 7 Ty 1 

, +7+ 1 '- 


a 


„ , [ {1+Vl)/2 d ( t y 2 - cos 2 ttt/i , 

6 + L W/ V" : 1 ?1 


a 


To compute T 2 _>.i, freeze x 2 . The resulting mapping <h^. is the dyadic mapping in the x\ 
direction. As above, 




[ 0,^1 

u 

[1 1 + 1 / 1 ] 

L 2 J 


L 2 ’ 2 


which gives 


^V(i /0 = 


d 


dyi J ®- 1 ([ 0 , 2 / 1 ]) 


pi(£i)^£i 


Pl ( TT ) + Pi 


1 + 1/1 
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On the other hand, 

(^)i(yi) = [ ^p(yi,y2)dy 2 

Jr 

1 , 1 / 1 + yi\ , 

" 2 Pl (~ 2 ) + 2 Pl{ ~ 2 ~ ) + 


ry l /2 ^ 


( 1 + 1 / 0/2 Q 


T\ -Pi / 77-Pl(£l)^£l 


dy 


' 1/2 


<9yi 


Pi^yJ+P 1 


1 + yi 


So 


T- 2 ^i = E logi(yi) - £log (&p) 1 ( 2 / 1 ) = 0, 


( 22 ) 


just as one would expect based on the independence of $1 on x 2 . This serves as a validation 
of Theorem 111.4. 

To compute Tj_> 2 , notice 


«^([ 0 ,!/ 2 ]) = 


cos 47nci 1/2 — cos 47nri 


a 


a 


The corresponding F-P operator is such that 

J/2 —COS 47TX} 




P2^2)d^2 ~ ~P2 


a 


y 2 — cos Al'kxx 


a 


= -P2O0), 


a 


which makes sense, considering that, when x\ is frozen, y 2 is just a translation followed by 
a rescaling of x 2 . On the other hand, the marginal density 


(^v) 2 (y 2 ) = / ^V(yi,y 2 )dyi 


1 

2 a 


P 


'o L 

rl 


1 0 

yi y 2 - cos 


2 vyi\ 


+ P 


1 + yi 1)2 - COS 


H— / dy ! 


a 


2 ’ a 7 

r i/2 <9 / 

Jo dy/ V lj a 


2 vyi\ 


a 


J 


dyi 


IJ2 - cos 2vn/i 


r(i+yi )/2 ^ 

d{i+ L wj 1 61 


y 2 - cos 27ry, 


a 




Because of the intertwined y\ and y 2 , these integrals cannot be explicitly evaluated without 
specifications of p. But when p is given, it is a straightforward exercise to compute 


-E log(^ p) 2 {y 2 ) = 



log (^ 0)2 ($2 (an, x 2 ))p(x 1 ,x 2 )dx 1 dx 2 . 


Denote it by H 2 . Then 


Ti^ 2 = E\og(^p) 2 (^ 2 (xi,x 2 )) - E\og(0 s p) 2 (^ 2 (x l ,x 2 )) 



a 


p 2 (x 2 )p(x 1 , x 2 )dx 1 dx 2 + H -2 


= H 2 - H 2 /a. 


(23) 


Generally this does not vanish. That is to say, within the Kaplan-Yorke map, there exists a 
one-way information flow from X\ to x 2 . 
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D. Applications—The baker transformation and Henon map revisited 

Since its establishment, the Liang-Kleeman formalism has been applied to a variety of 
dynamical system problems. Hereafter we will re-study some benchmark examples and see 
whether the results are different. In this subsection we look at the baker transformation and 
Henon map. 


1. Baker transformation 


The baker transformation is an extensively studied prototype of area-conserving chaotic 
maps that has been used to model the diffusion process in real physical world. It mimicks 
the kneading of dough: first the dough is compressed, then cut in half; the two halves are 
stacked on one-another, compressed, and so forth; see Fig. 2 for an illustration. In formal 
language, it is $ : £2 —> £2, Q = [0,1] x [0,1] being a unit square, 


$(xi,x 2 ) 


( 2 ^ 1 , f), 

(2xi - 1, \x 2 + |), 


It is invertible, and the inverse is 


1 (x 1 ,x 2 ) 


(f, 2 x 2 ), 

(£i±i, 2 x 2 - 1), 


0 < xi < |, 0 < x 2 < 1, 
\ < < 1, 0 < x 2 < 1. 


0 < x 2 < i, 0 < xi < 1, 
\ < x 2 < 1 , 0 < xi < 1 . 


Thus the F-P operator & can be easily found 


^p(xi,x 2 ) = p [4 1 (x I ,x 2 )] • |J 1 


• 0 <x 2 < 5 , 

p{^jX, 2^2-1), I < x 2 < 1 . 


(24) 


(25) 


(26) 


We now use the above theorem to compute X^i. Integrating (26) with respect to x 2 , 


(^p)i(xi) = 


” 1/2 


'0 L 


P( y. 20 : 2 ) dx 2 + p C l + l 

(x 1 \ (x i + l 

p( Y ,x 2 )+pl^,x 2 


, 2a; 2 - 1) dx 2 


dx 9 


p. (f)+pi 


Xi + 1 


(27) 


When x 2 is frozen as a parameter, the baker transformation (24) becomes a dyadic mapping 
in x\ direction, i.e., a mapping $1 : [0,1] —* [0,1], 

®i(xi) = 2a;i (mod 1). 

For any 0 < X\ < 1, The counterimage of [0, aq] is 


S - 1 ([ 0 ,a;i]) 



1 1 + Xi 

2 ’ 2 
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So 




— [ 

dxi i([ 0 ,xi]) 
q r* i / 2 

< 9 xi Jo 

1 T /- x 'i 

2 P 


p(s) ds 


Q Al+xi )/2 

dXi A/2 


p(s) (is + 7^- / p(s) (is 


)+P 


1 + Xi 


Thus 


(^V)i(Ti) = (^p)i(xi). 

By the above theorem, 

T 2 ^i = £log(^fy>)i($i(x)) - £Tog(^pi($i(x)) = 0 . 

To compute Ti^. 2 , observe 

(^rt 2 fe)= f ^p( Xl , x^dx, = {{»/( 

io l Jo P 2 x 2 - 1) axi, 2 < < 1 - 

By Proposition ??, 

iP 2 (r + l) = -S x log(^p) 2 ($ 2 (xi,x 2 )) 

= ~ J o P2M log p(^,x 2 )dXj dx 2 

,1/2 


( 28 ) 


" 1/2 


[ p 2 (^ 2 ) log ( [ p{ X ~J 1 ,x 2 )d\ ) fix 2 
'1/2 Vio 2 


p 2 (x 2 ) log (2I p(Z,x 2 )dA dx 2 - [ p 2 {x 2 )log (2 [ p(£,x 2 )dAdx 
\ Jo I J1/2 \ J1/2 J 


" 1/2 


,1/2 


= — log 2 — / p 2 (x 2 )log / p(^,x 2 )f/M cix 2 - / p 2 (x 2 )log / p(f,a 2 )d£ dx 2 , 


'1/2 


'1/2 


so 


Ai/ 2 = // 2 (r — 1)*- // 2 (r) 

r l/2 


= -log 2 


(r 


p 2 (x 2 )log / p(£,x 2 )d€\dx 2 ~ p2{x2)\og[ / p(£,x 2 )o?^ dx 2 


1 /■! 



p(^l, £2) 

'0 Xo 
= — log 2 + (/ + J), 


V 

log ( [ p(A, x 2 )d\ 


' 1/2 


' 1/2 


dx\dx 2 


where 


,1/2 


/ = 


J = 


'1/2 




pM ■ 


log 


fj p(X,x 2 )dX 


f 0 1/2 p(X,x 2 )dXj 


log^T 


Jo p(A, x 2 )ciA 


/i/2 P(A) x 2 )dX 


dx 2 , 

( 29 ) 

dx 2 . 

( 30 ) 


13 



To compute H 2 \, notice that, when x\ is frozen, the transformation is invertible; moreover, 
the Jacobian J 2 = 1/2 is a constant. By Theorem III.3, 


A H 2% 



-log 2 . 


(31) 


which gives 


IU 2 — A H 2 — /\H 2 ^ — / + E. 


(32) 


It is easy to show that / + E >0. In fact, obviously / + E is nonnegative; besides, the two 
brackets cannot be zero simultaneously, so it cannot be zero. Hence T ^ 2 is strictly positive; 
that is to say, there is always information flowing from the abscissa to the ordinate. 

To summarize, T 2 _yi = 0, 7 \_> 2 = / + II >0. These results are precisely the same as those 
obtained before in [45] and [47]. That is to say, for the baker transformation, the current 
formalism shows no difference from the previous one based on heuristic arguments and with 
approximations. 





FIG. 2: A schematic of the unidirectional information flow from the abscissa to the ordinate upon 
applying the baker transformation. 


2. Henon map 

The Henon map is a mapping $ = (<f>i, $ 2 ) : K 2 (->■ R 2 defined such that 

f $i(xi,z 2 ) = 1 + x 2 -axl, /oo\ 

\$ 2 (x 1 ,x 2 ) = bx 1 , 

with a > 0, b > 0. The case with parameters a = 1.4 and b = 0.3 is called a “canonical 
Henon map,” whose attractor is shown in Fig. 3. 

It is easy to see that the Henon map is invertible; its inverse is 

<F _ 1 (Xl,X 2 ) = (y, X\ 1 T . (34) 

The F-P operator thus can be easily found from (11): 

& f p{x 1 ,x 2 ) = p($ _1 (a;i,a; 2 ))| J _1 | 
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FIG. 3: A typical trajectory of the canonical Henon map (a = 1.4, b = 0.3). 


= x i-l + ^ 2 )• (35) 

In the following we compute the flows between the quadratic component X\ and the linear 
component x 2 . 

Look at ^~ 2 —so first. By (16), we need to find the marginal density of X\ at step r + 1 with 
and without the effect of x 2 , i.e., (^p)i and p) i^. From (35), p)\ is 


(^p)i(xi) 


^p(xi,x 2 ) dx 2 


' P 


x 2 | a 2 

T’ x '~ 1+ r 2 




p(?/, xi — 1 + a?/ 2 ) dp. 


(x 2 /b = rj) 


If a = 0, this would give p 2 {x\ — 1), i.e., the marginal pdf of x 2 with argument x\ — 1. But 
here a > 0, the integration is taken along a parabolic curve rather than a straight line. Still 
the final result will be related to the marginal density of x 2 , for notational simplicity, write 

i&p) i{xi) = h{xi)- (36) 

To find (<^p)i, use y\ to denote 

<Fi(xi) = 1+ x 2 - ax \, 


following our convention to distinguish variables at different steps. Modify the system so 
that x 2 is now a parameter. As before, we need to find the counterimage of (—oo, 2 / 1 ] under 
the transformation with x 2 frozen: 


$ 1 1 ((-c», 2 /i]) = (-oo, \/(l T x 2 -yi)/a U y/(l + x 2 -yi)/a, oc) 
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Therefore, 


d 


<*W> W = 


f pi(s) rfs 

d f-W+’K-Vi)/* d 

Pi(s) ds + — 


dy i y_ c 


Pl(s) (is 


2 a/«(1 + x 2 - pi) 

1 


^ 2 /i «/( 1 +X 2 - 2 / 1 )/' 

Pi l -a/( x + ^2 - 2/1)/a) + Pi (V(l + x 2 ~Vi)/ a 


2 a|xi| 


[Pi(-^i) + P 1 O 1 )] • 


(pi < 1 + x 2 ) 
(recall yi = 1 + X 2 — axf) 


Denote the average of pi(— x\) and pi(xi) as pi(aii) to make an even function of x\. Then 
<£^p) 1 is simply 


(^V) 1 ( 2 / 1 ) = 


PiQp) 

a\x\\ 


(37) 


Note that the parameter x 2 does not appear in the arguments. Substitute all the above into 
(16) to get 


T 2 ^i = Elog(^p) 1(2/!) - ^log(^p)i(pi) 

= Elog ^pp- — E logp 2 (l + x 2 -ax\) 
a\xi\ 

= Elogpi(xi) - Elog |aa:i| - Elogp 2 (l + x 2 - ax\). 


Comparing this to the result in [47], except for the term — Elog|aa;i|, all other terms are 
different. 

Next consider 7 i_^ 2 . From (35), the marginal density of x 2 at r + 1 is 


Thus 


(^p) 2 (x 2 ) 


&p(xi,x 2 ) dx 


1 



Xi — 1 + a 



1 

b 


p{y,t) d£ 



dx 1 


H 2 = -E(0>p) 2 (y 2 ) 



= H 1 + log b. 


The evaluation of H 2 \ is much easier. As X\ is frozen as a parameter, y 2 becomes definite. 
In this case, the 2D random variable is degenerated to a ID variable. Correspondingly 
becomes a pdf in x\ only. So 

(^V)2 = [ &\pdx 1 = 1 . 

Jm. 
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Thus H 2 \ = — E \og{0 > ^p) 2 = 0 . By (16), the information flow from X\ to x 2 is, therefore, 

Ti ^2 = H 2 — H 2 ^ = Hi + log b. (38) 

In other words, the flow from X\ to x 2 is equal to the marginal entropy of x,\ , modified by 
an amount related to the factor b. Particularly, when b — 1, Ti _> 2 = Hi. This is precisely 
the same as what is obtained before in [47]. 

In a summary, the information flows within the baker transformation are precisely the 
same as we have obtained before in [47]. For the Henon map, the flow from xi to x 2 , has 
recovered the benchmark result based on physical grounds. But T 2 ^i is generally different 
from that in [47]. 

IV. CONTINUOUS-TIME DETERMINISTIC SYSTEMS 
A. Deriving the information flow 

Now look at the information flow within the continuous system, the 2D version of which 
motivates this line of work: 


or, in vectorial form, 


dx i 

dt 
dx 2 

dt 


dx n 

dt 


Fi(t-xi,x 2 ,..., x n ), 

(39) 

F 2 (t-xi,x 2 ,..., x n ), 

(40) 


(41) 

F n (t\xi,x 2 , ...,x n ), 

(42) 


dx 

dt 


F (i;x). 


(43) 


Consider a time interval [t,t + At], Following [48], we discretize the ordinary differential 
equation and construct a mapping $ : M n —> M n , x(f) i— > x(t +At) = x + FAf. Correspond¬ 
ingly there is a Frobenius-Perron operator & : L 1 (M ra ) —> L 1 (M n ), p(t) H > p(t + At). Write 
x(f + At) as y, a convention we have been using all the time to avoid confusion. Then the 
mapping <f> : x H > y is such that 


yi = xi + Fi(xi,x 2 , ...,x n )At, 

y 2 = x 2 + F 2 (xi,x 2 , ...,x n )At, 

. . (44) 

Un = %n + F n (x i, X 2 , ..., X n )At. 


Its Jacobian is 



®Fi 

= 1 +E si Ai +°< Ai ) 

= 1 + V • FAt + o(At). 


( 45 ) 


As At —> 0, J —> 1 7 ^ 0, so $ thus constructed is always invertible for At small enough. 
Moreover, it is easy to obtain the inverse mapping 


<f > _1 : x = y - FAt + o(At) 


(46) 


and J 1 = 1 — V • FAt + o(At). So 


r-ll 


&p{ y) = p($ (y)) • \J 

= p( y - FAt) • (1 - V ■ FAt) + o(At) 

= p(y) — Vp • FAt — pV • FAt + o(At) 
= P(y) - V • (pF)At + o(At). 


As a verification, check 


dp = hm gy(x)-p(x) = _ 

dt At^o At K ' 


This yields the Liouville equation ^ + V • (pF) = 0, as is expected. 
With A?p we now can compute the marginal density 

(^)i(t/i) = Pi(t/i) - At [ t ^^-dy 2 ...dy n + o(At) 

i-i dy i 


J R n 


which gives 

log(t^p)! (r/i) = — log pi(pi) — 
= -log pi ( 2/0 + 


log 

At 


At f dpFi 

1 -/ ——d 2 / 2 ...dy„ + o(At) 


Pi(t/i) J 

= — log pi(xi + FiAt) + 


Pi J R«-! dy i 
<9pFi 

-v— dy 2 ...dy n + o(At) 

in -1 OT/l 

At 


Pi(^i) 


<9pFi 

—— ax 2 ...dx n + o(At). 

-i OX\ 


At the last step, the p's have been replaced by x'.s in the integral term. This is legitimate 
since the difference goes to higher order terms. 

We now evaluate the marginal entropy increase at t + At. The following is the key step: 
Take expectation on both sides, the left hand side with respect to (^p)i(pi), while the right 
hand side with respect to pi(xi). This yields 


HUt + At) = — E log pi hr i + -FAt) + AtF ( — [ ^ 1 dx 2 ...dx n ) + o(At). 

VPi J r»-i ox i / 

= Fi(t) - E^^-FiAt + At [ pi—dxi [ ^ -dx 2 ...dx n + o(At). 

ox i pi J R n —i oxi 

Note the third term on the right hand side vanishes after integration with respect to X\ cine 
to the compactness of the functions. So 

H { [t + At) = H x {t) - AtE ( F i dl ^ Pl ) + o(At), 
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and hence 


m = hm = _ £ A AlA) . (47 ) 

dt At->o At \ dx i J 

This is precisely the same as that either from the F-P operator [47] or directly from the 
Liouville equation[45], serving a validation of our approach in this study. 


When X 2 is frozen as a parameter on [t, t + At], we need to examine the modified mapping 
^ : M n_1 -)■ M n_1 

{ 2/i = xi + F 1 (xi,x 2 , ...,x n )At, 
y 3 = x 3 + F 3 (xi, x 2 , ..., x n )At, 

: : ( 48 ) 

Vn = x n + F n (x 1 , a? 2 , ..., 247 .) At, 


i.e., the mapping $ with the equation y 2 = ay+ 7*2 At removed, and x 2 frozen as a parameter. 
Again, here ay stands for ay(t), and y l for ay(t + At). For convenience, we further adopt the 
following notations: 

y$ = ( 2 /i, 2/3, ■■■■,y n ) T , 

X's? = (xi, a: 3? . ...,x n ) T , 

F ^(Fi,F 3 ,...,F n ) T 


Besides, use p^ to signify the joint density of x^, and pi^ to denote the density of ay with 
o ; 2 frozen as a parameter on [t,t + At]. Notice the fact = p\ at time t. 

It is easy to know that the Jacobian of <f>^ 


Jv = det 


5A 

x^ 


OFi 

1 + At^ — + °(At). 




(49) 


The corresponding F-P operator AA : L 1 (M n 1 —> L 1 (M" 1 is such that 


^V?(y) 


p^($ S(y)) • IV 


Ps(yi? - F^A t) ■ 



P\(y$ - V • (p^F^)At + o(At). 


Integrate with respect to (y 3 , ... ,y n ) (recall that X 2 is now a parameter) to get 

(^fy*?) 1 ( 2 / 1 ) = pMvi) - A t [ dp ^ Fl dy 3 ...dy n + o(At), (50) 

J R "-2 cry 1 

where other terms vanish due to the compactness assumed for the functions. Hence 


-log(^V^)i(yi) 


-logPi'S?( 2 / 1 ) - log 


At 

Pi'niVi) 


— log Pi's? (24 + F\At) + 


At 

PlH?(2h) 


^^-dy 3 ...dy^j + o(At) 
^ 1 dx 3 ...dx n + o(At). 

<73? x 
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Note in the integral term, the y's have been replaced by x's ; this is legitimate as the difference 
goes to the higher order terms. Since p\^{x\) = pi(x\) at t, so 


Pi\(xi + F 1 At) = piOi) + 


dpi 

dx\ 


FiAt + o(At) 


and hence 

-log(^V^)i(?/i) = - log pi - -^p-FiAt + ^ / df ^ Fl dx 3 ...dx n + o(At). 

PlOXi Pl\Xl) J OX 1 

Take expectation on both sides, the left hand side with respect to the joint probability density 
of (y\ , X2), while the right hand side with respect to (xi, X2). This is the key step that makes 
the present study fundamentally different from [48] which relies on an approximation to 
fulfill the derivation. This yields 


H^(t + At) = H t (t ) - AtE ^ / ; + 

d log pi 


= H t (t) - AtE F\ 


1 dxi 


At 

At 


Aah£il iXldX2 [ AA dX3 ,„ dXn + 0 (At) 


Pi(xi) 


dx 1 


P2\ 


1 dp^Fid^ + 0 (At), 


dx 1 


where p 2 |i is the conditional density of x 2 on x\. Thus 


dH 


A 


dt 


= —E Fi 


d log pi 
dx 1 


+ 


dp^Fi 

P 2 "^T dx - 


We therefore arrive at the following theorem: 

Theorem IV. 1 


T 2 ^i — 


dHi dll \ 2 


dt 


dt 


P2\l 


dp\>F 


dxi 


-dx = —E 


J_ f dFip^ 

Pi Jw -- 2 dx 1 


dx 3 ...dx r 


(51) 


(52) 


B. An alternative derivation 

For the continuous system 

//x 

- = F(x,f), (53) 

consider an interval [t, t + At], and a mapping 

4> : R n -A M n , x(t) x(t + At) = x(t) + FAt. 

Recall that by definition F/0(x(t + At) = f ^(x)p(a;, t +At)dx, +o(At), for any test function 
ip, and 

Eippx.pt + At) = Eippxpt) + FAt + o(At)) 

= E [ip{x(t)) + • FAt + o(At)]. 
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Note the expectation on the left hand side is with respect to p(t + At), and that on the right 
is with respect to p(t). This way we obtain the Liouville equation. 

Now let "0 be the functional log(,^p)i. When X 2 is frozen, on interval [t, t + At], there 
is a Liouville equation 


d ] dFips ? : dF-ip^ dF n p^ _ 


(54) 


dt dx\ dx 3 dx r 

for p^ the joint density of (aq, x 3 ,..., x n ). The equation for its marginal density p^ = 
f Rn -2 p^dx 3 ...dx n is, after integration with respect to (x 3 ,X 4 ,..., x n ) and with the considera¬ 
tion of the compact support assumption, 


<9pn> + <9 


dt dxi 

Divided by p^, this yields 

<91ogpi^ 


dt 


+ 


/ Fip^dx 3 ...dx n = 0. 
Jw 1 - 2 

—— ^ dx 3 ...dx n = 0 . 
-2 dx 1 


Discretizing, and noticing the fact pi\(t) = pi(t), 

r 

log pi+ Af;aq) = logpi(t;xi) - At 


1 dF x p^ 


dx 3 ...dx n + o(At), 


IRn ~2 Pl'y dX\ 

which is log(<^p)i(xi). As conventional, let x(t + At) = y and leave x for x(t) to avoid 
confusion. We actually need to find 

log(^V)i(pi) = logpi^t + At; 2/1) 

1 dF 1 p^ 


= log pi(t; x[t + At)) — At 


1-2 pi^ dxi 


-dx 3 ...dx n + o(At) 


= logp 1 (t]x) + ^z£L FiAt-At / dF ^P^ 

dx 1 J R n- 2 p dx 1 


dx 3 ...dx n + o(At). 


Taking expectation and multiplying by (-1) on both sides, we obtain 


H^(t + At) = H] (t) - AtE F] 


d log pi 
dxi 


So 


dH^ H^t + A^-H^At) 

— 1 = iim — 2 ---= E 

dt At->o At 


+ AtE 
1 <9Fip^ 


1 dFipy 

Pi <9a;i 


dx 3 ...dx r 


1-2 pi dx 1 

On the other hand, from the Liouville equation it is easy to obtain 

dH x 


dx 3 ...dx n — E ( Fi 


d log Pi 
dx 1 


Hence 


dt 


dH\ dFfi'y 

dt dt 
f , dF lP , 

= / log pi ———dx — E 


, dF ' P, 

l°g Pi~x. — dx. 

OX i 


(55) 


1 — 


(9a;i 


= —E 


1 

Pi 


dFip^ 


1 (9Fip^ 
1-2 pi dx 1 


dx 3 ...dx n + E < F\ 


d log pi 
dx: 1 


dx 3 ...dx r 


Jn-2 (9xi 

which is the same as (52) in Theorem IV. 1. 


(56) 
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C. Properties 


Theorem IV.2 For a 2D system 


we have 


= F 1 (x 1 ,x 2 ,t), 
= F 2 (x l ,x 2 ,t), 

dH n = e /dF\\ 
dt \ dxi ) ' 


(57) 


Remark: This recovers Eq. (5), the key equation originally obtained by Liang and 
Kleeman[45] through heuristic argument. Here we rigorously prove it. 

Proof. When n — 2, pv — pi, hence 

dH = E ( 1 ^ipA _ E ( 
dt \pi dxi J \ dx\ 

c r^i, p 9pi 1 r 9 log p l 

- E irF l ~xr 

dF 1 
dxi 




□ 


Theorem IV.3 Property of causality 

For the system (39)-(J/2), if F\ is independent of x 2 , then T- 2 ^i = 0. 


Proof. If F\ has no dependence on x 2 , so is F\p 2 . Thus 


T 2 ^ — —E 


1 

LPi 


dF\p^ 

dx 1 


dx-s... dx r 


p(x 2 \xi) ® dx 


dFip% 

dx\ 


dx\ 
dx\d/x :i ...dx r 


= 0 . 


where the fact f p(x 2 \xf)dx 2 = 1 and the assumption of compact support have been used. 

□ 
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D. Application—Rossler system 


In this subsection, we present an application study of the information flows within the 
Rossler system: 


dx „ . . 

* = F * = ~y ~ z ’ (58) 

^77 = Fy = x + ay, (59) 

^ = F z = b + z(x -c), (60) 

where a, b, and c are parameters. Otto E. Rossler finds a chaotic attractor for a = 0.2, 
b = 0.2, c = 5.7 ([51]), as shown in Fig. 4. From the figure the trajectories are limited 
within [-12,12] x [-14,10] x [0,25]. 



FIG. 4: The Rossler attractor. 


To calculate the information flows, one needs to obtain the joint probability density 
function p{x 1 , 0 : 2 , £ 3 ). It is, of course, obtainable through solving the Liouville equation 

dp_ dF x p dF y p dF zP 

dt dx dy dz 

with some initial condition p 0 . However, there is another way, namely, ensemble forecast, 
which is more efficient in terms of computational load. As illustrated in Fig. 5, instead of 
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solving the Liouville equation, we solve the Rossler systems initialized with an ensemble of 
initial values of x. This ensemble is formed with entries randomly drawn according to the 
initial pdf p 0 . At each time step, we count the bins thus obtained and estimate the pdf. The 
resulting pdf is the desired p. 

The Rossler system (58)-(59) is solved using the second order Runge-Kutta method with 
a time step size At = 0.01. A typical computed trajectory is plotted in Fig. 4. The initial 
conditions are randomly drawn according to a Gaussian distribution S), the mean 

vector and covariance matrix being, respectively, 


"8 


'4 0 O' 

2 

, s = 

0 4 0 

10 


0 0 4 


The initial mean values are chosen rather randomly (in reference to Fig. 4); p x is chosen 
large to make '-Hf = E(V • F) = p x — 5.5 positive. 

Pick a computation domain fl = [—16,16] x [—18,14] x [—4,28], clearly covers the at¬ 
tractor. We discretize it into 320 x 320 x 320 = 32, 768, 000 bins with Ax = Ay = Az = 0.1. 
To ensure one draw for each bin on average, in the beginning we make 32, 768, 000 random 
draws. As the ensemble scheme is carried forth, p and all other statistics can be estimated as 
a function of time. By Theorem IV. 1 the information flow rates are computed accordingly. 



FIG. 5: A schematic of ensemble prediction. Instead of solving the Liouville equation for the 
density p, we make random draws according to the initial distribution p(fo) to form an ensemble, 
then let the Rossler system steer forth each member of the ensemble. At each time step, bins are 
counted and the probability density function is accordingly estimated. 


For a system with three components (x,y,z), there are in total 6 flow pairs: T x ^. y , 
Ty^ z , T Z ^ X , T x ^. z , T y ^. x , T x ^ z . A first examination of the system tells that dy/dt does not 
depend on z and dz/dt does not depend on y. By the property of causality (Theorem IV.3), 
T z ^. y and T y ^ z must vanish. The computational results reconfirm this. In Fig. 6, the two 
are essentially zero. What makes the results surprisingly interesting is that T x ^ z is also 
insignificant, while the dependence of dz/dt on x is explicitly specified. Besides, T z ^ x is also 
small. In the figure are essentially the flows between x and y. T y ^ x and T x ^ y . 

The above information flow scenario motivates us to check the system with only x and y 

T 0 —1" 

two components. This is an amplifying harmonic oscillator % = Ax where A = 1 , 
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FIG. 6: The time series of the information flow rates within the Rossler system (in nats per unit 
time). Dashed: T y ^ x ; dotted: T x ^ y \ solid: T z ^ x . Other flows are essentially zero in this duration. 
The initial segments are not shown as some trajectories are still outside the attractor. 



t 


FIG. 7: The time series of the information flow rates within the amplifying harmonic system as 
shown in the text (in nats per unit time). 


a linear system allowing the information flow, say, from y to x, to be simply expressed 
as T y ^ x = (see below in section VII). That is to say, here the covariance matrix 

X = ((Jij) completely determines the flow. The evolution of X follows 

— = AX + XA t . 

at 


Initialized by 


4 0 
0 4 


, (Jij can be easily computed; the resulting T y ^ x and T x ^ y are shown 


in Fig. 7. Comparing to those in Fig. 6, the general trend, including the period, seems to be 


25 








similar, though the geometry of the curves has been modified from harmonic into a seesaw 
ones. Besides, the T x ^ y (T y ^ x ) is always negative (positive) for the harmonic oscillator, 
while for the Rossler system, they can be both negative and positive. Note the parameter a 
in A does not explicitly appear in the formula, but it does contributes to the generation of 
the information flow. One may easily check that, if it is zero, then ^ = 0, and hence the 
flow rates will stay zero if originally a 12 = 0. 

The above example is just used for the demonstration of application and, in some cases, 
for the validation of the proven theorems such as the property of causality. The seemingly 
vanishing T x ^, z in spite of the dependence of dz/dx on x for sure deserves further investigation 
but is beyond the scope of this study. Here we just want to mention that this does conform 
to the observations with complex systems-Emergence does not result from rules only (e.g., 
[53]-[54]). It has long been found that regular patterns may emerge out of irregular motions 
with some simple preset rules; a good example is the 2D turbulent flow in natural world 
(e.g., [55]). Clearly, these simple, rudimentary rules are not enough for explaining the causal 
efficacy and the bottom-up flow of information that leads to the emergence of the organized 
structure. As commented by Corning[56], “Rules, or laws, have no causal efficacy; they do 
not in fact generate anything... the underlying causal agencies must be separately specified.” 
We shall see a more remarkable example in the following subsection. 


E. Application—The truncated Burgers-Hopf system revisited 


Here we re-examine the Truncated Burgers-Hopf system (TBS hereafter), a chaotic system 
which seemingly has rather simple information flow structures in the studies of Liang and 
Kleeman[48]. For a detailed description of the system itself, see [58]. In this section we only 
examine the following particular case: 


dx 1 

dt 

dx 2 

dt 

dx 3 

dt 
dx 4 

dt 


= Fi(x) 


= F, 


= F» 


= F d 


As we have described before, the system 
embedded in 


V 

is 


= x x x A - X 3 X 2 , 

(61) 

= —X1X3 - X2X4 , 

(62) 

= 2X4X2, 

(63) 

= —x x + x 2 . 

(64) 


intrinsically chaotic, with a strange attractor 


[-24.8,24.6] x [-25.0,24.5] x [-22.3,21.9] x [-23.7,23.7], 

The information flow within the TBS cannot be found analytically. As before, we use the 
ensemble prediction technique to estimate the density evolution, and then evaluate the T’s. 
The setting and procedure are made precisely the same as that in [48] in order to facilitate 
a comparison. Details are referred to the original paper and will not be presented here. 
Figure 8 plots the results for the case with a Gaussian initial distribution lV(/x, X), where 

a\ 0 0 O' 

0 erf 0 0 

0 0 uf 0 ’ 

0 0 0 o\ 


hi 


h2 

, £ = 

h3 


_ m _ 
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with /j, = (9,9,9, 9), cr 2 = (9, 9, 9, 9). Shown specifically are the time rates of the 12 infor¬ 
mation flows: 


^2—> 1, 


^4—s-1 

^l->2, 

-^3-42) 

-^4—>2 

4 

CO 

to J 

4 

CO 

2~4—>3 

7l->4, 




The results are qualitatively the same as before in [48]. That is to say, except for T 3 ^. 2 , 
which is distinctly different from zero, all others are either negligible, or oscillatory around 
zero. But, of course, the present flows are much smaller in magnitude, in comparison to the 
one obtained before in [48] using the approximate formula. 
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FIG. 8: Information flows between the components of the 4D truncated Burgers-Hopf system in 
the invariant chaotic attractor. 
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V. STOCHASTIC MAPPING 


A. Derivation 

Consider the system 


x(t + 1) = $(x(t)) + B(x)w, (65) 

where <J> : —> M n is an n-dimensional mapping, B is an n x m matrix, and w an Tri¬ 

dimensional normally distributed random vector, representing an m-dimensional standard 
Wiener process. Without loss of generality, we assume the covariance matrix of w, X = I, 
since the perturbation amplitude can be put into B. 

In general, B may depend on x. But this complicates the derivation a lot. For simplicity, 
in this section we only consider the case when B is a constant n x m matrix. As x(r) is 
taken to x(r + 1), there exists an operator, written & : L 1 (M n ) —» L 1 (M n ), steering the 
pdf at time step r, p, to the pdf at step r + 1, &p. Since x(r) and w are independent, 
if B is a constant matrix, one may view x(r + 1) as the sum of two independent random 
variables, and then conjecture that &p be the convolution of &<&p and some joint Gaussian 
distribution. Here stands for the F-P operator associated with the mapping <f>. This is 
indeed true, as is stated in the following theorem. 


Theorem V.l 


where 


&p{y) = [ y 

J R" 


Bw) • p w (w)dw 


p w (w) = (2vr)- m / 2 (det S)“ 1/2 e -^w T x-i w 


( 66 ) 


Proof. We first assume that ( l> is invertible to make the approach more transparent to the 
reader. As always, write x(r + 1) as y to avoid confusion. Make a transformation: 


n : 


y = <f>(x) + Bw, 
z = w. 


(67) 


Its Jacobian 


Jit 


det 


d(y,z) 
(9(x, w) 


The inverse mapping is 


det 


B 


d<S> 
dx 

0 I 



= J. 




X = $ -1 (y-Bz), 
w = z. 


For any S y G M”, S z G M m , 



p yz (y,z)dydz 


/ p xw (x, w)dxdw 

'11 -HSyXSz) 

[ Pxw (n -1 (y,z)) • \J~ X \ dyd\ 

' SyXSz 


z. 


( 68 ) 


( 69 ) 



So 

Py Z { y,z) = p xw (n _1 (y,z)) • | J" 1 ! 

= p™ ($ -1 (y - Bz),z)) • |j _1 | 

= p($ -1 (y-Bz)) • |j _1 | ■ Pw ( z), 

where the independence between x and w has been used (hence p xw = p x -p w )• <^(y) = p y (y) 
is thence the marginal density by integrating out z: 

^V(y) = [ p ($ _1 (y - Bz)) • | J _1 | • p w { z) dz. 

JR" 

Since p (<h _ 1 (y)) • (J” 1 ! = <^$p(y), the theorem thus follows. 

When <f> is singular or noninvertible, let its F-P operator be then VSb G M n , S z G M m , 


p yz (y,z)dydz = 


Jn-i(s y xs z ) 

= / p w (z)dz 

Js z 

= [ p w (z)dz 


in^(s y xs z ) 
p(x) ■ p w (z)dxdz 

/ p(x)dx. 

'Q-'Sy-Bz 

/ ^*$p(x) dx 

' Sy-Bz 


p xw (x,z)dxdz 


= / p w (z)dz / ^.(y-Bz)dy. 

The conclusion follows accordingly. □ 

With the above theorem, the information flow can be easily computed. Note the theorem 
actually states that 

^p(y) = E w ^p(y - Bw) 

where E w signifies the expectation taken with respect to w. So 
fliO+l) = —E y \og(£?p)i(yi) 


(70) 


= — E r 


log/ E w ^p(y - Bw)dy 2 ...dy r 


= -E x [logE w (^p) 1 (y 1 -B 1 w)}, 

where Bi = (&n, b\ 2 , ...,&i m ) is the row vector. Likewise, 

^V(y^) = E w &^p( y^ - B^w). (71) 

In the equation, the subscript in the vector(s) and matrix means the second row is removed 
from the corresponding entities. So 

H^(t + 1) = -^log(^p)i(pi) 


= ~E X 
= —E x 


log E w 0>* p{y^ - B^w)dy z ...dy n 
J R "-2 

log£? w (^,p)i(j/i - Biw) . 


Subtract idi^(r + 1) from ddi(r + 1), and the information flow T 2 ^\ follows: 
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Theorem V.2 


T'2^\ — E x 


log£7„(^»*_p)i(yi - Biw) - E x [\ogE w (^p)i(yi - Biw)]. 


(72) 


B. Properties 

Theorem V.3 Property of causality 

For the system (65), if &i and Bi are independent of x 2 , then T 2 ^i = 0. 

Proof. As we proved for the deterministic case, if <f>i is independent of x 2 , then (^^p)i °= 
(F?^p) i- If further Bi has no dependence on x 2 , then the above Hi(r + 1) and Hi^{t + 1) 
are equal, and hence IWi = 0. □ 


C. Application: A noisy Henon map 


We now reconsider the benchmark systems that have been examined before, but with 
Gaussian noise added. The baker transformation is not appropriate here, since the added 
noise perturbation will take x outside the domain [0,1]. We hence only look at the Henon 
map <f> : M 2 —> M 2 : 


$i(x 1 ,x 2 ) — 1 + x 2 - axj, 
® 2 (xi,x 2 ) = fix 1 , 


(73) 


with parameters a, P > 0, and consider only the case T±^ 2 which has been shown as a 
benchmark case. Now perturb <f> to make a stochastic mapping: 


x(t + 1) = <h(x(r)) + Bw 


(74) 


where B = (b l3 ) is a constant matrix, w ~ iV(0,1). Let B, = (b lA , b l2 ) denote a row vector. 
It is easy to see that <f> is invertible; in fact, J = 


P 0 


$ (xi,x 2 ) = ( —, Xl - 1 + x 2 


= —ft ^ 0. The inverse is 


Thus 


%p(x 1 ,x 2 ) = p(<f> 1 (x 1 ,x 2 )) J =p 


f, + 


-1 


(75) 


(76) 


So &p{ y) = E w FP$p(y — Bw), and 

(^pMsfe) = f dy^J-p 

Jr P 


y 2 - B 2 w , « / -r x 2 

---, y 1 - B x w - 1 + — {y 2 - B 2 w) 

P P 


- pE wP 1 


y 2 - B 2 w 

P 


If x\ is frozen, $ 2 (xi,x 2 ) = px 1 is a constant. Hence H 2 \(t + 1) = 0, and 
7i—> 2 = H 2 (t + 1) — EJ 2 \{t + 1) 
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-0 


= —E 


log-E w p 1 


y 2 - B 2 w 


= log (3 — E W E X log pi 


= log ft- E W E X log pi ( x, 
= log f3 -f- dPH i. 


P > 

1)2 - B 2 W 

P 

B 2 w 


p 


(77) 


Here is the functional II \ applied by a Gaussian filter. One may understand it as H\ 

smeared out by a Gaussian filter. It is less than H\, so the noise addition makes the system 
lose some information, compared to Xi_> 2 = log f3 + Hi in the deterministic case. 


VI. CONTINUOUS-TIME STOCHASTIC SYSTEMS 

A. Derivation 

Following what we have done in section IV, we derive the information flow within a 
continuous-time stochastic system by taking the limit of the corresponding discrete stochastic 
mapping. In doing this, the results in the preceding section are ready for use. But, as noted, 
in the above derivation we have assumed a constant matrix B, a simplified case allowing 
for a clear expression of information flow. (This case does have realistic relevance, though.) 
For a time continuous system, this assumption actually can be completely relaxed. In the 
following we will see why. 

Consider a system 


dx = F(£; x.)dt + B (t; x)dw, 

(78) 

where x and F are n-dimensional vector, B is an n x m matrix, 
standard Wiener process. Note that B can be a function of both x 
equation may also be written as 

and w an m-vector of 
and time t. This above 

— = F(f; x) + B(t; x)w, 

(79) 

where w a vector of white noise, or, in component form, 


dx i „ . , ., , 

— = Fi(t] x) + B ! (t; x)w, 

(80) 

= F 2 (t; x) + B 2 (t; x)w, 

(81) 

dx n ^ , 

= F n (t\x) + B n (t;x)w. 
dt 

(82) 

(83) 

In the equations we have used B, to indicate the i-th row vector of B. Now consider (78) 
on a small interval [t,t + At]. Euler-Bernstein differencing, 

x(t + At) = x(t) + FAt + BAw. 

(84) 
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This motivates the introduction of a transformation 


/ y = x + F(x)At + B(x)Aw, 
Y z = Aw. 


(85) 


As shown in the discrete mapping case, generally this transformation cannot be inverted. But 
for this special case where At and Aw are small, the inversion can be done asymptotically. 
In fact, 


y = x + [F(y) + o(At)]At + [B(y)Aw + V(B(y)Aw)(x - y)] + o(Aw 2 ) 

= x + F(y)At + B(y)Aw + [V(BAw )](—F At — BAw) + o(At). 

Note, here the higher order terms means terms with order higher than At or (Aw) 2 — We 
will see soon that E(Aw) 2 = At. The above expansion helps invert II to 

n -i . f x = y - FA t - Bz + V(Bz)(Bz) + o(At), / g6) 

1 Aw = z. ' ' 


The following proposition hnds the Jacobian associated with the inverse transformation. 

Proposition VI. 1 Define the double dot of two dyadics A and B as A : B = £ a, 
then 


J~ l = 1 - V • FA t - V • (Bz) + -VV : (Bzz+B 1 ) + o(At). 


(87) 


Proof. By definition 

<9(x, Aw) 
. <9(y,z) 


J~ l = 


= det 


<9x <9x 

dy dz 

9Aw d Aw 


= det 


%- -b 

ay 

0 


I 


=<KI'h < 8 *> 


The key is the evaluation of det (|p 


L dy dz 

By (86), it is, up to o(At), the determinant of 


1 dyl dyi Z k + dyi ( ^l,k,s dyi Zk ^ 


db lk 

dyi 


d_ 

dyi 


dbi 

dyi 


' dyl Y,k d dyt Zk + dy n ( ^ l,k,s Z ^ 


dblk 

dy 


d_ 

dyn 


dbik 

dyi 


dF n 

dyi 


a* - E* 


d^nk 

dyi 


Zk + dyi 



db nk 

dyi 


^kbls^s 


Recall that, for an n x n matrix A = (a t j), 


1 


dF n 

dyn 


- E* 


d^nk 

dy n 


Zk + dTu 



d^nk 

dyi 


^k^ls^s 


n 

det A = ^ sgn(cr) JJ a ii<T . 

u€P n i =1 


(90) 


where P n is the totality of permutations of {1, 2,..., n). By this formula, the terms of order 
At and Aw are easy to find; they can only come from the diagonal entries. For terms of 
order (Aw) 2 , there are three sources: 

(1) the last term at each diagonal entry, together with n — 1 l’s; 

(2) multiplication of two entries at (i,i) and (j,j), i j, together with n — 2 l’s on the 
diagonal; 
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(3) similar to (2), but with entries at (i,j) and (j,i), i ^ j- 

In (3) the order between i and j switches and it has sgn = — 1 for its permutation. Except 
for (3) involving off-diagonal entries, all others are from the diagonal. So 


£ =n i-£*-e 


u / r uu ik \ 




Notice the factor 1 in the last term. Because of the symmetry between i and j and they 
repeat once when summed over i,j = 1 ,n. Thus 

( <9x \ 0 F{ . dbik 


Notice 


1 \ - \ -v db ik ~ 

2 4^ * dy t Zk 

1^=3 k 

1 \ - \ -v db ik __ 

2 Qy zk 


. V | W 9 

. 9 n ‘ u tt a V' 

■ V ^z, + o(At). 


d 2 bikb js _ db ik db js d 2 b js db js db ik 

rv r\ r'v ' '-''IK, rv rv ‘ r'v ' j 

dyidyj oyi dyj dyidyj oyi dyj 


dyidyj 


\ ^ \ ^ db dbik \ ^ \ ^ dbj S db lk \ -v \ -v . 

db[js dbi k , 1 db is db ik 1 

Fh,. Fh, ■ ZkZs + 9LL Fh,. Fh,. ZkZs + 


1 ^ ^ UVyjO UUi 

2 4" dy.i dy 

k,s z/j 


’ dyidyj 


2 tr i 


1 ^ dbjs db^ 


bEEb 


fc,s i,j 


d 2 b ik d 2 b js \ ~ 

^js rv o ”1” ®ik rv o ) ^k^S' 

dyidyj dyidyj J 


The last parenthesis holds because the two pairs (i, k ) and (j, s) may be switched under the 
summation without changing the result. Thence 


E d F-, v—v—-\ dbik 

7W A( -EEtw^ 


1 \^—> dbik dbj s 


1 \ ^ \ ^ dbjs db^ 


nEEb 


i,j k,s 


d 2 b ik . d 2 bj S \ . . . 

dsj—x -h b ik z k z s + o(At) 

dyidyj dyidyj J 


1 v-^ dFi ^ 1 ^ ^ ik z k z s bjs 

2 E—^— + °< A( > 


= 1 - V • FAf - V • (Bz) + -VV : (Bzz t B t ) + o(At), 
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which is J 1 by (88). □ 


With J 1 , we can then evaluate the operator & and hence arrive at and 'FAa 


dt 


dt 


Proposition VI.2 Let BB r = G = (gif). The time rate of change of H i is 


dHi 

dt 


= -E 


E 


d log pi 


dxi 




0ii- 


d 2 log pi 
dx\ 


(91) 


Proof. For any subset S y £ M n , S z £ 
[ Pyz(y, z)dydz = 


lU^iSyXSz) 


p xw (x, Aw )dx.ddelw 


= j p xw (y — F At — Bz + V(Bz) ■ (Bz), z) • | J 7 | dydz 

JSyXS Z 

= I p(y-FAt-Bz + V(Bz)-(Bz))|j~ 1 | • p w { z) 

JSyXSz 

because p xw ( a, b) = p x (a) • p w (b) = p(a) • p w {h) clue to the independence between x and 
Aw. Since S y and S z are arbitrarily chosen, the integrand is the very joint pdf p yz { y, z). 
Thus 

&p{ y) = Py{ y) = / Pyz{y,z) dz 

J R m 

= [ [p(y-FAt-Bz +V(Bz) • (Bz)) • jj" 1 !] • p w (z) dz 

Jr™ 

= E w {p (y - FA t - BAw + V(BAw) • (BAw)) • | J’ 1 ]} 

p(y) — Vp • (FA t + BAw + V(BAw) • (BAw)) + -(BAw)(BAw) J : VVp 

1 - V • FAt - V • (BAw) + iw : (BAwAw t B t ) + o(At) 


= E„ 


= p(y) - (F • Vp + pV • F)At 

[pVV : (BB t ) + 2Vp • [V • (BB t )] + (BB t ) : (VVp)] At + o(At) 
= p( y) - V • (Fp)At + ivv : (BB r p)At + o(At). 


(92) 


Note here the fact 


EAw = 0, EAwAw' 1 = Atl 


(93) 


about Wiener process has been used. As a verification, one may obtain from this step 


% = i im ’^diy) p(y) = _ v . (Fp ) + i vv . (bb t p) 

dt At— >-o At \ n 2 \ n 


(94) 


which is precisely the Fokker-Planck equation (cf. the appendix). 

Denote BB 7 by G. Integrate both sides of the above equation with respect to 
( 2 / 2 , 2 / 3 , ■■■Py-a) to obtain 


(^p)i(yi) = pi(yi) - At 



dFip 

dyi 


dy 2 ...dy n + 


At 

~2 


d 2 9np 

dy\ 


dy 2 ...dy n + o(At), 
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and hence 


log(^p)i(j/i) = logpi(yi) 


At r 

Pi J R "- 1 


dF lP 

dyi 


dy 2 ...dy n + 


At 

2pi 



d 2 giip 

dy\ 


dy 2 ...dy n + o(At()95) 


So 


#i(t +At) = log {&p) 1 ( 2 / 1 ) = E log pi (jji) 

as the rest two terms vanish after applying the operator E(-) — J R pi(-)dyi. Expanding y\ 
around x\, and denoting Bi = ( 611 , 612 , ..., 6 i n ), we have 


H\ (t + At) = — E log pi (. x\ + F\ At + Bi Aw) 


= —E 


logpi(xi) + d ^ gpl (FiAt + BjAw) + \ d ^|^- BiAwAw r B[ 


= H 1 (t)-E 


F\ 


dx\ 
d log Pi 

dxi 


At - E 


9ir 


d 2 logpi 
dx\ 


dx\ 

At + o(At). 


+ o(At) 


Let At —y 0 and we finally arrive at 


dHi 

dt 


-E 



d log pi 
dxi 




d 2 logpi 
dx\ 


□ 

Now consider during the time interval [t, t +At] to freeze x 2 as a parameter, and examine 
how the marginal entropy of X\ evolves. In this case we are actually considering a density 
Pi^, with rhoiv(t ) = pi(t) under an (n — l)-dimensional transformation: M n_1 —> R n_1 , 


{ Pi = xi(t + At) = xi(t) + Pi At + BiAw, 
y .3 = x 3 (t + At) = x 3 (t) + F 3 At + B 3 Aw, 

Un = X n (t + At) = x n (t) + F n At + B„Aw. 


With this system we have the following proposition. 
Proposition VI.3 Let p^ be J R p(x)dx 2 , then 


dHyy 

dt 


-E 


+E 

--E 

2 


Fi 
: 1 
Pi 


d log pi 


dxi 


- i e 
2 


dFipy 


g n- 


d 2 logpi 


dx\ 


1 

Pi 


Jn-2 dXl 

f d 2 g u p 3 

7 r „-2 dx\ 


dx 3 ...dx r 


dx 3 ... dx r 


(96) 


Proof. Following the same procedure as above, we arrive at an equation for log(t^)i(pi) 
similar to (95): 


log(^V)i(pi) = logp^(pi) 


At 

Pi's? 


dFip% 
dyi 


dy 3 ...dy n + 


At 

2pi^ 



d 2 gnp'% 

dy\ 


dy 3 ...dy n + o(At). 
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So 


H^(t + At) = -£log(£fy)i(yi) 
= -E log p^ (j/i) 

" J_ f dFip^ 

LPi^ 7k"- 2 <9yi 

"J_ f d 2 g u p 

.Pi's? Jk"- 2 


+£ 


dy 3 ...dy„ 


At 


--E 

2 


<%? 


dy 3 ...dy„ 


At + o(At). 


Note at time t, p\^ = pi, and in the last two terms y can be replaced by x with error going 
to higher order terms. Thus 


TT^(t + At) = -S 

i r 


logpi(xi) + 


ai 0 gP i(F 1 Ai + B 1 Aw)+ 1S2 ‘° gft " 


+£ 

2 


dFip^ 


dx\ 


2 ctxf 


-BiAwAw^ B( 


Pi JR"- 2 <9^i 

J_ f d 2 gnp\ 

pi J R n -2 


dx 3 ...dx r 


At 


dx 3 ...dx r 


= H 1 (t)-E 

“i r 


F\ 


d log pi 


+£ 


dx\ 

dFip's, 


At - E 

2 


At + o(At) 

<9 2 log pi 
eta; 2 


At 


-is 

2 

Take the limit 


dxi 

_ d 2 guP\ 

Pi 7 r »-2 <9:r 2 


|_Pi Jr »- 2 
1 /• 


dx 3 ...dx r 


At 


dx 3 ...dx r 


At + o(At). 


iWt + At) -ifi(t) 
—7— = hrn --- 

dt At->o At 


and we arrive at the conclusion. □ 

Theorem VI.1 

1 f dFip'y 


T 2 ->i — —E 


Pi J R"-2 dxi 


dx 3 ...dx r 


+ \ E 


J_ f d 2 gnp^ 
Pi J R n -2 dx\ 


dx 3 ...dx r 


= -J P2 l.felxO^Adx+i 


P2|l(S2|ttl) ^ rfx - 


Proof. Subtract (96) from (91) and the conclusion follows. □ 


(97) 

(98) 


B. Properties 

Theorem VI.2 For a 2D system 


dHiy 

dt 


= E 


dFi 
dx i 


(99) 

( 100 ) 


in the absence of stochasticity. 
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Remark: This recovers the heuristic argument by Liang and Kleeman in [45]; see Eq. (5). 
Proof. In this case gn = 0, p^ = p\, so 


(IHyy 

dt 


E 

E 


- j. j L' x 

p\ QF\ aiogpi 

— tt- V r i—--ri 

Pl OX 1 

'dF\\ 

v dx\ ) ' 


(9a:i 


o log Pi 
<9x. 


□ 


Theorem VI.3 If gn = Y^k =i bikbik is independent of X 2 , the resulting T 2 ->i has a form 
same as its deterministic counterpart. 

Proof. If gn is independent of x 2 , so is f 9 dx :i ...dx n . Hence the integration can be 
simplified: 



d 2 gnp $ 
dx\ 


dx = 





d 2 gnp\> 

dx\ 


dxidxz...dx n = 


d 2 gnp'% 

dx\ 


dxidx3...dx n 


0. 


□ 


Theorem VI.4 (Property of causality) 

If both F\ and gn are independent of x 2 , then T- 2 ^i = 0. 


Proof. As proved above, when g n has no dependence on x 2 , the last term of X^i becomes 
zero. If, moreover, F\ does not depend on x 2 , then d Q^ does not, either. So the integration 
with respect to x 2 can be taken inside directly to pi 2 /pi = p 2 \i(x 2 \xi)\ 


T 2 ^ i = - dx i p 2 \i{x 2 \xi)dx 2 



dF\p^ 

dxi 


dxs...dx n 



dF\ p\> 
dxi 


dxs...dx n 


0. 


□ 


C. Application: A stochastic gradient system 

We are about to study the information flow within a system which has a drift function 
in the gradient form. We particularly want to understand how stochastic perturbation may 
exert influence on the flow. The gradient systems are chosen because their corresponding 
Fokker-Planck equation admit explicit equilibrium solutions, i.e., solutions of the Boltzmann 
type. To see this, let 


F = - W, 


( 101 ) 
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where V = H(x) is the potential function. For simplicity, suppose that the stochastic 
perturbation amplitude B = 61 where / is the identity matrix and 6 = const. Hence 
G = BB t = g I, and g = b 2 is a constant. It is trivial to verify that 



( 102 ) 


where Z is the normalizer (or partition function as is called in statistical physics), solves 

v • (pF) = Ijvy, 

the equilibrium density equation for the system 


dx = —Welt + 6Idw. (103) 

As an example, consider the potential function 

V = \( x ixl + x 2 2 x 2 3 + xl + x 2 2 + xl). (104) 

This system, though simple, results in a compactly supported density function, while allow¬ 
ing for asymmetric nonlinear interactions among aq, x 2 , and x 3 . The resulting vector held 
is 


F\ = —x\x\ — aq, 

F 2 = -x 2 x\ - x 2 x\ - x 2 , 

F-i = -x 3 x\ - x 3 . 

Obviously, T 3 ^ 1 = T\^ 3 = 0 by the theorem on causality property. The general how from 
Xj to Xi is 



The computation seems to be easy, but by no means trivial. The difficulty comes from the 
evaluation of the conditional density pj\i(xj\xi). Theoretically this is not a problem, but 
in realizing the computation we have to consider the problem on a limited domain, which 
may not effectively cover the support of the density function. Here we choose a domain 
[—5,5] x [—5,5] x [—5,5], and a spacing size Ax = 0.05. The computation is implemented 
henceforth. 

To test how the stochastic perturbation may affect the information how. tune 6 to see 
the response. The tuning range is rather limited, though, with the present computational 
domain. Shown in Fig. 9 are the results. As expected, T 3 _>i and T\_> 3 are identically 
zero. For others, the how rates generally increase with 6. That is to say, they tend to 
increase the uncertainty of their corresponding target components. This makes sense, since g 
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FIG. 9: Information flow within a gradient system with the potential function (104). 


functions like temperature in thermodynamics, and increase in T surely will lead to increase 
in uncertainty. If examining more carefully, one finds that the increase is actually not 
symmetric. Those going to x 2 (T 3 _>. 2 and T\^ 2 ) are faster than those leaving x 2 {T 2 ^ 1 and 
T 2 _> 3 ), reflecting the property of asymmetry of information flow. 

Since p can be accurately obtained, this example can be utilized to validate our numerical 
computations for more general cases. 


VII. LINEAR STOCHASTIC SYSTEMS 

As always, it would be of interest to look at the particular case, namely, the case with 
linear systems: 


dx = A xdt + Bdw, 


( 106 ) 
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with A and B being constant matrices. If originally x is normally distributed, then it is 
normal/Gaussian forever. Let its mean vector be p and its covariance matrix be X. Then 


dp 

dt 

dX 

dt 


Ap, 

AX + XA r + BB t . 


(107) 

(108) 


In component form p = (pi,p n ) T , X = (cpj) nxn , and BB^ has been denoted by G in 
the above. The distribution is, therefore, 


P 


_ e --|( x -A r £ 1 (*-/*). 

yj (27r) n det X 


We need to find 

Pi, Pi-2, P'q 

, and the following facts will help. 

Fact 1: p^ is a multivariate Gaussian N(p X^) where py = (pi, p 3 , p 4 ,..., p n ) n , and X^ 
is the covariance matrix of (x 4 ,x 3 ,x 4 , ...,x n ) n . 

Fact 2: The conditional probability density function p 2 |i is 


in other words, 


P2\l 


^[o;2-M2-^Gi-mi)] 2 

5 

(109) 

042/ 1 Ai 2 \ 

(Xi Pi), 

04 1 cr 11 / 

(110) 


In the above equations, we have used, and will be using, A,, to shorten det 


a ij a jj 


We now compute the information flow T 2 _>. 4 . Since B is constant (hence independent of 
x 4 ), the stochastic term vanishes by Theorem VI.3. So we need only consider its deterministic 
part: 


T-2->\ — —E 


1 

Pi 


dF\p% 

dx\ 


= ~ f p 2 \i(x 2 \xi) d ^ lP ^ dx. 

Jr” ox i 


As a starting point, let us consider the case n — 3. By the proposition above, 


So 


P'S? - Pl3 - 


_ — b'SaG 1 “in ) 2+<T11 Gs-w) 2 ]-2(713 Gl-Ml)G3-£t3)] 


V(2tt) 2 A 13 


dFip^ 

dx\ 


dx 3 = / pi 3 {an + [043(x 3 - p 3 ) - cr 33 (xi - pi)] ■ (a n xi + ai 2 x 2 + ai 3 x 3 )/Ai 3 }dx 3 


^13/1.3 + 033(24 — ^i)(<Jnb + 01224) 

= OnPi- 7 -Pi 


A 


13 
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P 13 • \ai 3 (J 13 xl + (a n x 1 + ai 2 a; 2 )o-i 3 a: 3 - ( 0 - 13 p 3 + o- 33 (xi - /ii))ai 3 x 3 ] dx 3 . 
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We need to find f R X 3 prsdx 3 and J R x§pi 3 <ix 3 . Since (xi,X 3 ) is a bivariate Gaussian, 


x 3 |xi ~ N ( /i 3 + — (xi - /h), — 

(7ll <7ll 


we thence have 


Pi3X 3 dx 3 = Pi J P'i\\ x :idx3 = pi ■ (p 3 + ^-(xi~ Pi)^j , 


Pl3x\dX3 = Pi P3\ixldx 3 = Pi 


A 


13 


<711 


<713 , 


+ p3 “t-(xi ~ Pi) 


cr n 


Substituting back, we obtain: 


r dFip% 
Jr dxi 


dx 3 


< 7 i 3 p 3 + <733(^1 — Pi)(auXi + 012X2) 

°npi-x-P 1 

^13 


+<7l3<7l3 



P 3 H--(aii — Pl) 

<7ll 



Pl 

A 13 


+ [(onXi + 012X2)1713 — (<713/13 + <733 (xi — /ii))ai3] 


p3 H -(Xl — Pl) 

cr 11 


Pi 

^13 


Thus 

rji _ ^ 1 -9F1P13 

12->1 — — X/---0X3 

Pl <7X1 

1 r 

— On — —-[—<713/13011/11 — <713/13012/12 — Oii<T 33 <Th — Oi2<733<Ti2 

^13 

A 13 / 2 2/2 \ 

TO13CT13-h Oi3<Ti3(/I3 + <7i 3 /<7 11 '<711)+ 011013/13/ii 

On 

+Ol2P3<7l3p2 — Oi3<Ti3/l3 — 0 + Oii<Ti 3 /<Tii • <7n 

+<7l2<7i 3 /<7n ' <712 — 0 — Oi3<733O13/CTn • On] 

<712 

— Oi2-• 

<7ll 

The so many terms are canceled out, and the result turn out to be precisely the same as 
that for the 2D case we have derived before ever since Liang and Kleeman (2005)! 

The above remarkably concise formula actually holds for systems of arbitrary dimension¬ 
ality. This makes the following theorem: 

Theorem VII. 1 If an n-dimensional (n > 2) vector of random variables (xi,...,x n ) T 
evolves subject to the linear system 


dx = A xdt + Bdw, 


where A = (a t j) and B are constant matrices, and if its covariance matrix is (cr if), then the 
information flow from x 3 to Xj is 


for any i,j = 1, i j. 


a. 


T. — n.. Ill 
- L j—¥i 5 


(Tv. 


(in) 
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Proof. It suffices to prove the case (i,j) = (1,2); if not, we may always reorder the com¬ 
ponents to make them so. We prove by induction. The 3D case has just been shown above. 
Now suppose (111) holds for n-dimensional systems. Consider an n+l-dimensional system 


dx i 
dt 


n 

^ jXj T (^l,n+l-^n+l i 

3 = 1 


dx 2 

dt 


y ^ Cl n jXj + ®n,n+l+n+l 


3 =1 


dx. 


n+l 


dt 


y ^ Q'n+l,j3'j T ®n-f l,n+l*Di+l • 

i=i 


To distinguish, we now use p n to denote the joint density for the n-dimensional system. The 
information flow from X 2 to X\ is 


To—u — 


/ I s dF l 

p2\l{X2\Xl)— -ax 

ln+1 OX l 


d 

ftiiwr 

£n+l OX l 


( ^ ( T (Ogn+l+n+OP^ 


3 = 1 


dx 




d 


P2\1T> - (ogn+l^n+lP^) Tx. 

n+l OX\ 


Note the hrst term results from integration with respect x n+ \ , since all the variables except 
are independent of x n+ \. This is precisely the information flow from x% to X\ for an 
n-dimensional system; by our assumption it is ai 2 <Ti 2 /<7n. For the second term, note that 
all variables, except p 2 p, are independent of a; 2 , so we may take integral with respect to 
X 2 directly inside with p 2 \\. But / R p 2 |iG+ 2 = 1, so the second term results in the integral 
of Jh-(ai n+1 a; n+ ip^) which vanishes by the compactness of p. Therefore (111) holds for 
n+1-dimensional systems. By induction, it holds for systems of arbitrary dimensionality. □ 


Let us see an example: A = 


'1-20' 


'l 0 o' 

1 0 -5 

, and B = 

0 2 0 

1 

to 

1 


0 0 3 


. In component form, 


the equation is 


( 112 ) 

(113) 

(114) 


The evolution of the covariance matrix C is governed by 

^ = AC + CA r + BB r . 

dt 


dx i 
dt 
dx2 
dt 

dx 3 

dt 


KAJiAJ ^ 

— = Xx - 2 x 2 + 0x 3 + Wi, 
Xi + 0 x 2 - 5^3 + 2 w 2 , 
—xi + 2 x 2 - x 3 + 3 + 3 . 
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( 115 ) 



Let it be initialized by 


. The solution is shown in Fig. 10. The rates of information 


1 0 0 
0 4 0 
0 0 9 


x 10 9 


ii 


x 10® 


12 


x 10 9 
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FIG. 10: Covariance evolution with the linear system (112)-(114). 


flow are subsequently obtained and plotted in Fig. 11. Among them, T^i = 0, just as 
expected by the property of causality. T 3 _>, 2 and T 2 _> 3 oscillate around a value near zero, and 
2~2—s-l oscillates around -0.9. The remaining transfers, 7\_> 2 and Tj_> 3 , albeit still oscillatory, 
approximately approach two constant values. The former approaches 0.16, while the latter 
approaches 1. 


VIII. SUMMARY 

Information flow, or information transfer as it may appear in the literature, is a funda¬ 
mental notion in general physics which has wide applications in different disciplines. In this 
study we have shown that, within the framework of dynamical systems, it can be rigorously 
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FIG. 11: As Fig. 10, but for rates of information. 


derived from first principles. That is to say, it is a notion ab initio, quite different from 
the existing axiomatic postulates or empirical proposals. In this light we have studied the 
information flow for both time-discrete and time-continuous differentiable vector fields in 
both deterministic and stochastic settings. In a nutshell, the results can be summarized as 
follows. 

Consider an n-dimensional state variable x = (x\,X 2 , ■■■x n ), the corresponding probability 
density function (pdf) being p(x\, X 2 , ■■■x n ), and the marginal pdf of Xi being p im For a 
deterministic mapping $ : —» M n , 

x(r) x(t + 1) = ($i(x), $ 2 (x), ...$„(x)), 

the rate of information flowing from a; 2 to X\ proves to be 

T 2-+1 = Elog(^p)i($i(x)) - £log(^V)i(<I>(i(x)), 

where E is the mathematical expectation with respect to x, g? the Frobenius-Perron oper¬ 
ator of <F, and g?^ the same operator of $ but with x 2 frozen as a parameter (so (^^)i(xi) 
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has dependence on x 2 ). The units are in nats per unit time; same below. If the system is 
continuous in time, i.e., 


dx 

* =F,x - t) ' 


then 


T 2-+1 — — 


d P \ F 1 , p 

' ,2|1 ^T dx= 


J_ f dp^F x 

[p\ Jm.n-2 dx\ 


dxs...dx r 


where py = f R p(xi,X 2 , ...,x n )dx 2 , and p 2 |i is the conditional pdf of X 2 on x\. When stochas- 
ticity comes in, in the discrete mapping case: 

x(r + 1) = $(x(t))) + B(x)w, 

where <f> : M ra —> M n is an n-dimensional mapping, B an n x m constant matrix, and w an 
m-dimensional standard Wiener process, then 


T 2 ^i = E x [logE w (3*^p)i(yi - Bpw)] - E x [logE w (^p)i(yi - Biw)], 

with Bi = (6n, 612, ..., bi m ) a row vector of the matrix B. Here we use E x and E w to indicate 
that the expectation is taken with respect to x and w, respectively. If what we consider is 
a continuous-time stochastic system, i.e., a system as 


dx = F(x, t)dt + Bdw, 


or alternatively written as 

dx „ 

- = F(x,t) + Bw, 

where w is the white noise, then the result can be explicitly evaluated: 

(116) 
(117) 

where gn = Y2j =1 bijbij. Note the first term is just from the deterministic vector field, while 
the second the contribution from the noise. It has been proved that, if b\ ] has no dependence 
on X 2 , then the stochastic contribution vanishes, making the information flow same in form 
as that from its deterministic counterpart. We have particularly examined the case F = Ax, 
i.e., the case when the system is linear and autonomous, 

dx = A xdf + Bdw 


d~2^1 — — E 


1 

LPi 


dFip'g 

dxi 


dx 2 ,-..dx r . 


+ \ E 


P2|i(z 2 |zi)^^dx+^ 


J_ f d 2 g n py 

pi d R n -2 dx\ 

, 1 i d2 9u P't 

P2\i(x 2 \xi) 


dx:i...dx r 


dx, 


with A = (dij) nxn and B = ( bij) nxm being constant matrices, then the information flow 
from Xj to X{ is remarkably simple: 

_ (J jj 

r T 1 . . — n . ._ — 

J-j—ti U'lJ 1 


45 



for any 1 < i,j < n, i ^ j. This result is precisely the same in form as originally we 

obtained for 2D deterministic systems based on intuitive arguments in [45]. 

Historically it has been a long-time endeavor to relate information flow to causality. We 
want specifically to have that, if ^ 0, then x 3 causes x*, otherwise x 3 is not causal. 
With the existing empirical/half-empirical measures for information flow, such as the widely 
used transfer entropy, the endeavor has been fruitful for some problems but unsuccessful 
for others (e.g., [44]), and the inconsistency has even led to doubt about the association 
between information flow and causality (e.g., [38]). In this study, the implied causality, the 
touchstone one-way causality in particular, is a proved fact for dynamical systems, as stated 
in various theorems. More specifically, when the evolution of x % does not depend on x 3 , then 
Tj^i = 0. This is particularly clear in the above linear case, the dependence of Xi on x 3 
is from the entry a %3 of A, so when it is zero, then x 3 is not causal to x % . This result also 
quantitatively, and unambiguously, tells us that, causation implies correlation, but not vice 
versa, resolving the long-standing debate over correlation versus causation. 

The above results have been put to applications with a variety of benchmark systems. 
Particularly we have re-examined the baker transformation, Henon map, and truncated 
Burgers-Hopf system. The results are qualitatively similar to what we have obtained before 
using an approximate formalism, but with magnitudes significantly smaller. Also shown are 
the information flows within a Kaplan-Yorke map, a noisy Henon map, a Rbssler system, 
and a stochastic gradient flow. We look forward to more applications to real world problems 
in the near future. 
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